ETSI TS 126 191 V9.0.0 (2010-01)



Technical Specification 

Digital cellular telecommunications system (Phase 2+); 
Universal Mobile Telecommunications System (UMTS); 

LTE; 

Speech codec speech processing functions; 

Adaptive Multi-Rate - Wideband (AMR-WB) speech codec; 

Error concealment of erroneous or lost frames 

(3GPP TS 26.191 version 9.0.0 Release 9) 








3GPP TS 26.191 version 9.0.0 Release 9        ETSI TS 126 191 V9.0.0 (2010-01)



Reference 



RTS/TSGS-0426191V900 
Keywords 



GSM, LTE, UMTS 



ETSI 

650 Route des Lucioles 
F-06921 Sophia Antipolis Cedex - FRANCE 

Tel.: +33 4 92 94 42 00 Fax: +33 4 93 65 47 16

Siret N° 348 623 562 00017 - NAF 742 C
Non-profit association registered at the
Sous-Préfecture de Grasse (06) N° 7803/88



Important notice 



Individual copies of the present document can be downloaded from: 
http://www.etsi.org

The present document may be made available in more than one electronic version or in print. In any case of existing or
perceived difference in contents between such versions, the reference version is the Portable Document Format (PDF).
In case of dispute, the reference shall be the printing on ETSI printers of the PDF version kept on a specific network drive
within ETSI Secretariat.

Users of the present document should be aware that the document may be subject to revision or change of status. 

Information on the current status of this and other ETSI documents is available at 

http://portal.etsi.org/tb/status/status.asp

If you find errors in the present document, please send your comment to one of the following services: 

http://portal.etsi.org/chaircor/ETSI_support.asp

Copyright Notification 

No part may be reproduced except as authorized by written permission. 
The copyright and the foregoing restriction extend to reproduction in all media. 

© European Telecommunications Standards Institute 2010. 
All rights reserved. 

DECT™, PLUGTESTS™, UMTS™, TIPHON™, the TIPHON logo and the ETSI logo are Trade Marks of ETSI registered
for the benefit of its Members.
3GPP™ is a Trade Mark of ETSI registered for the benefit of its Members and of the 3GPP Organizational Partners.
LTE™ is a Trade Mark of ETSI currently being registered
for the benefit of its Members and of the 3GPP Organizational Partners.
GSM® and the GSM logo are Trade Marks registered and owned by the GSM Association.






Intellectual Property Rights 



IPRs essential or potentially essential to the present document may have been declared to ETSI. The information 
pertaining to these essential IPRs, if any, is publicly available for ETSI members and non-members, and can be found 
in ETSI SR 000 314: "Intellectual Property Rights (IPRs); Essential, or potentially Essential, IPRs notified to ETSI in 
respect of ETSI standards", which is available from the ETSI Secretariat. Latest updates are available on the ETSI Web 
server ( http://webapp.etsi.org/IPR/home.asp ). 

Pursuant to the ETSI IPR Policy, no investigation, including IPR searches, has been carried out by ETSI. No guarantee 
can be given as to the existence of other IPRs not referenced in ETSI SR 000 314 (or the updates on the ETSI Web 
server) which are, or may be, or may become, essential to the present document. 



Foreword 

This Technical Specification (TS) has been produced by ETSI 3rd Generation Partnership Project (3GPP). 

The present document may refer to technical specifications or reports using their 3GPP identities, UMTS identities or 
GSM identities. These should be interpreted as being references to the corresponding ETSI deliverables. 

The cross reference between GSM, UMTS, 3GPP and ETSI identities can be found under 
Foreword



This Technical Specification has been produced by the 3GPP. 

The present document defines an error concealment procedure, also termed frame substitution and muting procedure, of 
the wideband telephony speech service employing the Adaptive Multi-Rate - Wideband (AMR-WB) speech coder 
within the 3GPP system. 

The contents of the present document are subject to continuing work within the TSG and may change following formal 
TSG approval. Should the TSG modify the contents of this TS, it will be re-released by the TSG with an identifying 
change of release date and an increase in version number as follows: 

Version x.y.z 

where: 

x the first digit:

1 presented to TSG for information; 

2 presented to TSG for approval; 

3 Indicates TSG approved document under change control. 

y the second digit is incremented for all changes of substance, i.e. technical enhancements, corrections, 
updates, etc. 

z the third digit is incremented when editorial only changes have been incorporated in the specification; 
1 Scope



This specification defines an error concealment procedure, also termed frame substitution and muting procedure, which 
shall be used by the AMR-WB speech codec receiving end when one or more erroneous/lost speech or lost Silence 
Descriptor (SID) frames are received. 

The requirements of this document are mandatory for implementation in all networks and User Equipment (UE)s 
capable of supporting the AMR-WB speech codec. It is not mandatory to follow the bit exact implementation outlined 
in this document and the corresponding C source code. 



2 Normative references



The following documents contain provisions which, through reference in this text, constitute provisions of the present 
document. 

• References are either specific (identified by date of publication, edition number, version number, etc.) or 
non-specific. 

• For a specific reference, subsequent revisions do not apply. 

• For a non-specific reference, the latest version applies. In the case of a reference to a 3GPP document (including 
a GSM document), a non-specific reference implicitly refers to the latest version of that document in the same 
Release as the present document. 

[1] 3GPP TS 26.202: "AMR Wideband Speech Codec; Interface to RAN".

[2] 3GPP TS 26.190: "AMR Wideband Speech Codec; Transcoding functions".

[3] 3GPP TS 26.193: "AMR Wideband Speech Codec; Source Controlled Rate operation".

[4] 3GPP TS 26.201: "AMR Wideband Speech Codec; Frame structure".



3 Definitions and abbreviations 

3.1 Definitions 

For the purposes of this document, the following definition applies: 

N-point median operation: Consists of sorting the N elements belonging to the set for which the median operation is 
to be performed in an ascending order according to their values, and selecting the (int (N/2) + 1) -th largest value of the 
sorted set as the median value. 
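
The definition above can be illustrated with a short C sketch. It is only an illustration, not the reference implementation: the function name `median_n`, the `MAX_N` limit and the use of `float` values are assumptions made here. With 0-based C indexing, the (int(N/2) + 1)-th value of the ascending-sorted set is the element at index N/2.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_N 16   /* assumption of this sketch: largest supported N */

/* Comparison callback for qsort: ascending order of floats. */
static int cmp_float(const void *a, const void *b)
{
    float x = *(const float *)a;
    float y = *(const float *)b;
    return (x > y) - (x < y);
}

/* N-point median: sort a copy of the set in ascending order and take
 * the (int(N/2) + 1)-th value, i.e. index n / 2 in 0-based indexing. */
float median_n(const float *x, int n)
{
    float tmp[MAX_N];

    if (n <= 0 || n > MAX_N)
        return 0.0f;               /* sketch-only fallback for bad input */

    memcpy(tmp, x, (size_t)n * sizeof(float));
    qsort(tmp, (size_t)n, sizeof(float), cmp_float);
    return tmp[n / 2];
}
```

For N = 5 this returns the third value of the sorted set, which is the median5() operation used in subclause 6.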

Further definitions of terms used in this document can be found in the references. 

3.2 Abbreviations 

For the purposes of this document, the following abbreviations apply: 

AMR-WB Adaptive Multi Rate - WideBand 

AN Access Network 

BFI Bad Frame Indication from AN 

BSI_netw Bad Sub-block Indication obtained from AN interface CRC checks 

prevBFI Bad Frame Indication of previous frame 

RX Receive 

SCR Source Controlled Rate (operation) 

SID Silence Descriptor frame (Background noise) 

CRC Cyclic Redundancy Check 
ECU Error Concealment Unit

BFH Bad Frame Handling 

medianN N-point median operation 



4 General 

The purpose of the error concealment procedure is to conceal the effect of erroneous/lost AMR-WB speech frames. The 
purpose of muting the output in the case of several erroneous/lost frames is to indicate the breakdown of the channel to 
the user and to avoid generating possible annoying sounds as a result from the error concealment procedure. 

The network shall indicate erroneous/lost speech or lost SID frames by setting the RX_TYPE values [3] to 
SPEECH_BAD, SID_BAD or SPEECH_LOST. If these flags are set, the speech decoder shall perform parameter 
substitution to conceal errors. 

The example solution provided in subclause 6 applies only to bad frame handling on a complete speech frame basis.
Subframe-based error concealment may be derived using similar methods.



5 Requirements 

5.1 Error detection 

If the most sensitive bits of the AMR-WB speech data (class A in [4]) are received in error, the network shall indicate 
RX_TYPE = SPEECH_BAD in which case the BFI flag is set. When the frame is not received, the network shall 
indicate RX_TYPE = RX_SPEECH_LOST in which case the BFI flag is set as well. If a SID frame is received in error, 
the network shall indicate RX_TYPE = SID_BAD.
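
As a hedged illustration of the rule above, the BFI flag could be derived from RX_TYPE as follows. The enum `RxType` and the function `bfi_from_rx_type` are hypothetical names; only the values named in this document are listed, plus SPEECH_GOOD as an assumed error-free value. This subclause does not state that SID_BAD sets BFI, so the sketch leaves it cleared in that case.

```c
/* Hypothetical enum: RX_TYPE values named in this document, plus
 * SPEECH_GOOD as an assumed "no error" value. */
typedef enum {
    SPEECH_GOOD,
    SPEECH_BAD,
    SPEECH_LOST,
    SID_BAD
} RxType;

/* Returns 1 when the BFI flag is set according to 5.1:
 * class A bits in error (SPEECH_BAD) or frame not received
 * (SPEECH_LOST). All other values leave BFI cleared here. */
int bfi_from_rx_type(RxType rx_type)
{
    return rx_type == SPEECH_BAD || rx_type == SPEECH_LOST;
}
```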

5.2 Erroneous or lost speech frames 

Normal decoding of erroneous/lost speech frames would result in very unpleasant noise effects. In order to improve the 
subjective quality, erroneous/lost speech frames shall be substituted with either a repetition or an extrapolation of the 
previous good speech frame(s). This substitution is done so that it gradually decreases the output level, resulting in
silence at the output. Subclause 6 provides an example solution.

5.3 First lost SID frame 

A lost SID frame shall be substituted by using the SID information from earlier received valid SID frames, and the
procedure for valid SID frames shall be applied as described in [3].



5.4 Subsequent lost SID frames 



For many subsequent lost SID frames, a muting technique shall be applied to the comfort noise that will gradually 
decrease the output level. For subsequent lost SID frames, the muting of the output shall be maintained. Subclause 6 
provides example solutions. 



6 Example ECU/BFH Solution



6.1 State Machine 

This example solution for substitution and muting is based on a state machine with seven states (Figure 1). 
The system starts in state 0. Each time a bad frame is detected, the state counter is incremented by one and is saturated
when it reaches 6. Each time a good speech frame is detected, the state counter is right-shifted by one. The state
indicates the quality of the channel: the larger the value of the state counter, the worse the channel quality is. The
control flow of the state machine can be described by the following C code (BFI = bad frame indicator, State = state
variable):

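if (BFI != 0) {
    /* bad frame: move one state up, saturating at 6 */
    State = State + 1;
    if (State > 6)
        State = 6;
} else {
    /* good frame: right-shift the state counter by one */
    State = State >> 1;
}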

In addition to this state machine, the Bad Frame Flag from the previous frame is checked (prevBFI). The processing
depends on the value of the State variable. In states 0 and 6, the processing depends on the BFI flag.
The procedure can be described as follows:
STATE = 0: BFI = 0, prevBFI = 0 or 1
STATE = 1: (BFI, prevBFI) = (1,0) or (0,1) or (0,0)
STATE = 2: (BFI, prevBFI) = (1,1) or (1,0) or (0,1)
STATE = 3: (BFI, prevBFI) = (1,1) or (1,0) or (0,1)
STATE = 4: BFI = 1, prevBFI = 0 or 1
STATE = 5: BFI = 1, prevBFI = 1
STATE = 6: BFI = 1, prevBFI = 1

Legend: one arrow type marks transitions on a bad frame (BFI=1), the other marks transitions on a good frame (BFI=0).

Figure 1: State machine for controlling the bad frame substitution
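
To make the behaviour of the state counter concrete, the following sketch wraps the update rule of 6.1 in a function and runs it over a sequence of BFI flags. The function names `ecu_next_state` and `ecu_run` are hypothetical and exist only for this illustration.

```c
/* One update of the state counter as described in 6.1: a bad frame
 * increments the state (saturating at 6), a good frame right-shifts
 * it by one. */
int ecu_next_state(int state, int bfi)
{
    if (bfi != 0) {
        state = state + 1;
        if (state > 6)
            state = 6;
    } else {
        state = state >> 1;
    }
    return state;
}

/* Hypothetical helper: start in state 0, feed n BFI flags and
 * return the final state. */
int ecu_run(const int *bfi, int n)
{
    int state = 0;
    for (int i = 0; i < n; i++)
        state = ecu_next_state(state, bfi[i]);
    return state;
}
```

For example, six consecutive bad frames drive the counter to 6, and three good frames afterwards bring it back through 3 and 1 to 0, so recovery is faster than degradation.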

6.2 Substitution and muting of erroneous/lost speech frames 
6.2.1 BFI = 0, prevBFI = 0, State = 0 or 1

No error is detected in the received or in the previous received speech frame. The received speech parameters are used 
normally in the speech synthesis. The current frame of speech parameters is saved. 
6.2.2 BFI = 0, prevBFI = 1, State = 0 to 3



No error is detected in the received speech frame but the previous received speech frame was bad. The LTP gain is used
normally in the speech synthesis and the fixed codebook gain is limited below the value used for the last received good
subframe:

$$
g^{c}(n) =
\begin{cases}
g^{c}_{received}, & g^{c}_{received} < 100 \ \text{or} \ g^{c}_{received} < g^{c}(n-1) \times 1.25 \\
1.25 \cdot g^{c}(n-1), & \text{otherwise}
\end{cases}
\qquad (1)
$$

where:

$g^{c}_{received}$ = current decoded fixed codebook gain,

$g^{c}(n-1)$ = fixed codebook gain used for the last good subframe (BFI = 0),

$g^{c}(n)$ = fixed codebook gain to be used for the current frame.

The rest of the received speech parameters are used normally in the speech synthesis. The current frame of speech 
parameters is saved. 
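
The limiting rule of equation (1) can be sketched in C as below. This assumes the rule reads: keep the received gain when it is below 100 or below 1.25 times the gain of the last good subframe, otherwise use 1.25 times that previous gain. The function name `limit_code_gain` and the use of `float` are assumptions of this sketch.

```c
/* Limits the decoded fixed codebook gain in a good frame that follows
 * a bad frame (BFI = 0, prevBFI = 1).
 * g_received: current decoded fixed codebook gain
 * g_prev:     gain used for the last good subframe */
float limit_code_gain(float g_received, float g_prev)
{
    if (g_received < 100.0f || g_received < 1.25f * g_prev)
        return g_received;      /* received gain is used as is */
    return 1.25f * g_prev;      /* otherwise cap the increase at 25 % */
}
```

The effect is that a large gain decoded right after a concealed frame cannot jump by more than a factor of 1.25 relative to the last good subframe, while small gains (below 100) are never modified.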

6.2.3 BFI = 1, prevBFI = 0 or 1, State = 1...6

An error is detected in the received speech frame and the substitution and muting procedure is started. 

6.2.3.1 LTP gain & fixed codebook gain concealment when RX_FRAMETYPE = SPEECH_BAD

The LTP gain $g^{p}$ and fixed codebook gain $g^{c}$ are replaced by attenuated values from the previous subframes:

$$
g^{p} = P^{p}(state) \cdot \mathrm{median5}\big(g^{p}(n-1), \ldots, g^{p}(n-5)\big) \qquad (2)
$$

$$
g^{c} =
\begin{cases}
P^{c}(state) \cdot \mathrm{median5}\big(g^{c}(n-1), \ldots, g^{c}(n-5)\big), & VAD\_HIST \le 2 \\
\mathrm{median5}\big(g^{c}(n-1), \ldots, g^{c}(n-5)\big), & VAD\_HIST > 2
\end{cases}
\qquad (3)
$$

where:

$g^{p}$ = current decoded LTP gain,

$g^{c}$ = current decoded fixed codebook gain,

$g^{p}(n-1), \ldots, g^{p}(n-5)$ = LTP gains used for the last 5 subframes,

$g^{c}(n-1), \ldots, g^{c}(n-5)$ = fixed codebook gains used for the last 5 subframes,

median5() = 5-point median operation,

$P^{p}(state)$ = attenuation factor ($P^{p}(1)$ = 0.98, $P^{p}(2)$ = 0.96, $P^{p}(3)$ = 0.75, $P^{p}(4)$ = 0.23, $P^{p}(5)$ = 0.05,
$P^{p}(6)$ = 0.01),

$P^{c}(state)$ = attenuation factor ($P^{c}(1)$ = 0.98, $P^{c}(2)$ = 0.98, $P^{c}(3)$ = 0.98, $P^{c}(4)$ = 0.98, $P^{c}(5)$ = 0.98,
$P^{c}(6)$ = 0.70),

state = state number {0..6},

VAD_HIST is the number of consecutive VAD=0 decisions.

The higher the state value is, the more the gains are attenuated. Also the memory of the predictive fixed codebook gain
is updated by using the average value of the past four values in the memory:

$$
ener(0) = \frac{1}{4} \sum_{i=1}^{4} ener(n-i) \qquad (4)
$$
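
A minimal sketch of the SPEECH_BAD gain concealment follows. It assumes that the LTP gain is the 5-point median of its history scaled by the attenuation factor for the current state, and that the fixed codebook gain is treated the same way unless VAD_HIST exceeds 2, in which case the unattenuated median is used. The names `conceal_gains_bad`, `P_P_BAD` and `P_C_BAD` are hypothetical; index 0 of each table is a placeholder because no factor is given for state 0.

```c
#include <assert.h>

/* Attenuation factors for SPEECH_BAD, indexed by state 1..6.
 * Index 0 is an unused placeholder (assumption of this sketch). */
static const float P_P_BAD[7] = {1.0f, 0.98f, 0.96f, 0.75f, 0.23f, 0.05f, 0.01f};
static const float P_C_BAD[7] = {1.0f, 0.98f, 0.98f, 0.98f, 0.98f, 0.98f, 0.70f};

/* 5-point median: third value of the ascending-sorted set. */
static float median5(const float *x)
{
    float t[5];
    for (int i = 0; i < 5; i++)
        t[i] = x[i];
    for (int i = 1; i < 5; i++) {          /* insertion sort, ascending */
        float v = t[i];
        int j = i - 1;
        while (j >= 0 && t[j] > v) {
            t[j + 1] = t[j];
            j--;
        }
        t[j + 1] = v;
    }
    return t[2];
}

/* gp_hist / gc_hist: gains used for the last 5 subframes.
 * state: ECU state (1..6), vad_hist: consecutive VAD=0 decisions. */
void conceal_gains_bad(const float gp_hist[5], const float gc_hist[5],
                       int state, int vad_hist, float *gp, float *gc)
{
    assert(state >= 1 && state <= 6);

    *gp = P_P_BAD[state] * median5(gp_hist);

    float med_c = median5(gc_hist);
    if (vad_hist > 2)
        *gc = med_c;                       /* no attenuation */
    else
        *gc = P_C_BAD[state] * med_c;
}
```

Using the median rather than the last value makes the substituted gain robust against a single outlier in the history.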
6.2.3.2 LTP gain & fixed codebook gain concealment when RX_FRAMETYPE = SPEECH_LOST

The LTP gain $g^{p}$ and fixed codebook gain $g^{c}$ are replaced by attenuated values from the previous subframes:

$$
g^{p} = P^{p}(state) \cdot \mathrm{median5}\big(g^{p}(n-1), \ldots, g^{p}(n-5)\big) \qquad (5)
$$

$$
g^{c} =
\begin{cases}
P^{c}(state) \cdot \mathrm{median5}\big(g^{c}(n-1), \ldots, g^{c}(n-5)\big), & VAD\_HIST \le 2 \\
\mathrm{median5}\big(g^{c}(n-1), \ldots, g^{c}(n-5)\big), & VAD\_HIST > 2
\end{cases}
\qquad (6)
$$

where:

$g^{p}$ = current decoded LTP gain,

$g^{c}$ = current decoded fixed codebook gain,

$g^{p}(n-1), \ldots, g^{p}(n-5)$ = LTP gains used for the last 5 subframes,

$g^{c}(n-1), \ldots, g^{c}(n-5)$ = fixed codebook gains used for the last 5 subframes,

median5() = 5-point median operation,

$P^{p}(state)$ = attenuation factor ($P^{p}(1)$ = 0.95, $P^{p}(2)$ = 0.90, $P^{p}(3)$ = 0.75, $P^{p}(4)$ = 0.23, $P^{p}(5)$ = 0.05,
$P^{p}(6)$ = 0.01),

$P^{c}(state)$ = attenuation factor ($P^{c}(1)$ = 0.50, $P^{c}(2)$ = 0.25, $P^{c}(3)$ = 0.25, $P^{c}(4)$ = 0.25, $P^{c}(5)$ = 0.15,
$P^{c}(6)$ = 0.01),

state = state number {0..6},

VAD_HIST is the number of consecutive VAD=0 decisions.

The higher the state value is, the more the gains are attenuated. Also the memory of the predictive fixed codebook gain
is updated by using the average value of the past four values in the memory:

$$
ener(0) = \frac{1}{4} \sum_{i=1}^{4} ener(n-i) \qquad (7)
$$
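
The memory update described above can be sketched as follows, under the assumption that the predictor memory holds four past values with the newest at index 0 and that the new entry is pushed in front while the oldest is dropped. The function name `update_gain_predictor_memory` and the shifting scheme are assumptions of this sketch, not taken from the reference C code.

```c
/* ener[0] is the most recent value, ener[3] the oldest (assumed layout).
 * The new entry is the average of the four values currently stored. */
void update_gain_predictor_memory(float ener[4])
{
    float avg = 0.25f * (ener[0] + ener[1] + ener[2] + ener[3]);

    /* shift the memory by one position, dropping the oldest value */
    ener[3] = ener[2];
    ener[2] = ener[1];
    ener[1] = ener[0];
    ener[0] = avg;
}
```

Feeding the average back into the memory keeps the gain predictor in a plausible state, so that decoding of the first good frame after the loss starts from a smoothed energy estimate.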



6.2.3.3 ISF concealment 

The past ISFs are shifted towards their partly adaptive mean:

$$
ISF_{q}(i) = \alpha \cdot past\_ISF_{q}(i) + (1-\alpha) \cdot ISF_{mean}(i), \quad i = 0..16 \qquad (8)
$$

where

$\alpha$ = 0.9,

$ISF_{q}(i)$ is the ISF vector for the current frame,

$past\_ISF_{q}(i)$ is the ISF vector from the previous frame,

$ISF_{mean}(i)$ is a combination of the adaptive mean and constant mean ISF vectors in the following manner:

$$
ISF_{mean}(i) = \beta \cdot ISF_{const\_mean}(i) + (1-\beta) \cdot ISF_{adaptive\_mean}(i), \quad i = 0..16 \qquad (9)
$$

where

$\beta$ = 0.75,

$ISF_{adaptive\_mean}(i) = \frac{1}{3} \sum_{i=0}^{2} past\_ISF_{q}(i)$ and is updated whenever BFI = 0,

$ISF_{const\_mean}(i)$ is a vector containing the long time average of ISF vectors.
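
The ISF concealment can be sketched in C as below. The sketch assumes equation (8) blends the previous frame's ISF vector with a mean vector using a factor of 0.9, and equation (9) forms that mean as 0.75 times the constant mean plus 0.25 times the adaptive mean. The function name `conceal_isf` is hypothetical, the adaptive mean is taken as an input rather than computed, and the vector length is a parameter so that the sketch does not fix the index range.

```c
#define ISF_ALPHA 0.9f    /* weight of the previous frame's ISFs */
#define ISF_BETA  0.75f   /* weight of the constant mean (as read here) */

/* past_isf:      ISF vector from the previous frame
 * const_mean:    long time average of ISF vectors
 * adaptive_mean: mean of recent good-frame ISF vectors
 * isf_q:         output, concealed ISF vector for the current frame
 * n:             number of vector elements */
void conceal_isf(const float *past_isf, const float *const_mean,
                 const float *adaptive_mean, float *isf_q, int n)
{
    for (int i = 0; i < n; i++) {
        /* partly adaptive mean */
        float mean = ISF_BETA * const_mean[i]
                   + (1.0f - ISF_BETA) * adaptive_mean[i];

        /* shift the past ISF towards that mean */
        isf_q[i] = ISF_ALPHA * past_isf[i] + (1.0f - ISF_ALPHA) * mean;
    }
}
```

Applied over several consecutive bad frames, each call moves the spectrum another 10 % of the way towards the mean, so the synthesis filter drifts gradually to an average spectral shape.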



6.2.3.4 LTP-lag concealment 

The histories of five last good LTP-lags and LTP-gains are used for finding the best method to update. 
6.2.3.4.1 LTP-lag concealment when RX_FRAMETYPE = SPEECH_BAD 

The usability of the received LTP lag ($Q_{lag}$) is defined as follows (it predicts whether the received lag is most probably very
close to the one that was sent, so that its usage should not introduce any bad artifacts):

$$
Q_{lag} =
\begin{cases}
1, & T_{dif} < 10 \ \text{and} \ T_{min} - 5 < T_{received} < T_{max} + 5 \\
1, & g^{p}(n-1) > 0.5 \ \text{and} \ g^{p}(n-2) > 0.5 \ \text{and} \ T(n-1) - 10 < T_{received} < T(n-1) + 10 \\
1, & g^{p}_{min} < 0.4 \ \text{and} \ g^{p}(n-1) = g^{p}_{min} \ \text{and} \ T_{min} < T_{received} < T_{max} \\
1, & T_{dif} < 70 \ \text{and} \ T_{min} < T_{received} < T_{max} \\
1, & T_{mean} < T_{received} < T_{max} \\
0, & \text{otherwise}
\end{cases}
\qquad (10)
$$

where:

$T(n-1)$ is the LTP lag from the previous good frame,

$T_{dif} = |T_{received} - T(n-1)|$,

$T_{min} = \min(T_{buffer})$,

$T_{max} = \max(T_{buffer})$,

$T_{received}$ is the received lag,

$g^{p}_{min} = \min(g^{p}_{buffer})$,

$g^{p}$ is the LTP gain of the current frame,

$g^{p}(n-1)$ is the LTP gain of the previous good frame,

$g^{p}(n-2)$ is the LTP gain of the frame before the previous good frame,

$T_{mean} = \mathrm{average}(T_{buffer})$.

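The LTP lag value for the current frame is defined as follows:

$$
T =
\begin{cases}
T_{received}, & Q_{lag} = 1 \\
\frac{1}{3}\big(T_{max} + T_{max-1} + T_{max-2}\big) + RND\big(T_{max} - T_{max-2}\big), & Q_{lag} = 0
\end{cases}
\qquad (11)
$$

where:

$T_{max} = \max(T_{buffer})$,

$T_{max-1}$ is the second largest value in $T_{buffer}$,

$T_{max-2}$ is the third largest value in $T_{buffer}$,

$RND(x)$ is a random value generated in the range $[-\frac{x}{2}, +\frac{x}{2}]$.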

6.2.3.4.2 LTP-lag concealment when RX_FRAMETYPE = SPEECH_LOST 

The usability of the LTP lag from the last good frame ($Q_{lag\_t-1}$) is defined as follows (it predicts whether the lag is most
probably very close to the one that was sent, so that its usage should not introduce any bad artifacts):

$$
Q_{lag\_t-1} =
\begin{cases}
1, & g^{p}_{min} > 0.5 \ \text{and} \ T_{dif} < 10 \\
1, & g^{p}(n-1) > 0.5 \ \text{and} \ g^{p}(n-2) > 0.5 \\
0, & \text{otherwise}
\end{cases}
\qquad (12)
$$

where:

$g^{p}_{min} = \min(g^{p}_{buffer})$,

$g^{p}(n-1)$ is the LTP gain of the previous good frame,

$g^{p}(n-2)$ is the LTP gain of the frame before the previous good frame.

The LTP lag value for the current frame is defined as follows:

$$
T =
\begin{cases}
T(n-1), & Q_{lag\_t-1} = 1 \\
\frac{1}{3}\big(T_{max} + T_{max-1} + T_{max-2}\big) + RND\big(T_{max} - T_{max-2}\big), & Q_{lag\_t-1} = 0
\end{cases}
\qquad (13)
$$

where:

$T(n-1)$ is the LTP lag from the previous good frame,

$T_{max} = \max(T_{buffer})$,

$T_{max-1}$ is the second largest value in $T_{buffer}$,

$T_{max-2}$ is the third largest value in $T_{buffer}$,

$RND(x)$ is a random value generated in the range $[-\frac{x}{2}, +\frac{x}{2}]$.

6.2.4 Innovation sequence 

When RX_FRAMETYPE = SPEECH_BAD, the received fixed codebook innovation pulses from the erroneous frame 
are used as they are received. 

When RX_FRAMETYPE = SPEECH_LOST, the received fixed codebook innovation pulses from the erroneous frame 
are not used and the fixed codebook innovation vector is filled with random signal (values limited to 
range [-1, +1]). 
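
For the SPEECH_LOST case, filling the innovation vector with a bounded random signal could look like the sketch below. The function name `fill_random_innovation` is hypothetical, the vector length is left to the caller, and `rand()` is only a stand-in for the codec's own random generator.

```c
#include <stdlib.h>

/* Fills code[0..n-1] with pseudo-random values limited to [-1, +1]. */
void fill_random_innovation(float *code, int n)
{
    for (int i = 0; i < n; i++) {
        float u = (float)rand() / (float)RAND_MAX;   /* u in [0, 1] */
        code[i] = 2.0f * u - 1.0f;                   /* map to [-1, +1] */
    }
}
```

Because the concealed fixed codebook gain is attenuated with increasing state, the contribution of this random excitation fades as more frames are lost.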

6.2.5 High-band gain (for 23.85 kbit/s mode)

When RX_FRAMETYPE = SPEECH_BAD or RX_FRAMETYPE = SPEECH_LOST the received high-band energy 
parameter of the frame is not used and the estimation for the high-band gain is used instead. This means that in case of 
bad/lost speech frames, the high-band reconstruction operates in the same way for all the modes. 
6.3 Substitution and muting of lost SID frames

In the speech decoder a single frame classified as SID_BAD shall be substituted by the last valid SID frame information
and the procedure for valid SID frames shall be applied. If the time between SID information updates (updates are specified
by SID_UPDATE arrivals and occasionally by SID_FIRST arrivals) is greater than one second, this shall lead to
attenuation.